class: title-slide, left, bottom

# Introduction to R and RStudio

----

## **Session 1**

###

###

???

Wait 5 mins prior to start.

Hello and welcome to the NHS-R Community Introduction to R training. My name is..... (introduce self)

Introduce Ann: Ann is the SEA coordinator, not an R expert, but is here to help coordinate the Teams room, keeping an eye on the chat in case there are problems I don't notice because I'm presenting.

We'll do a round of introductions now. If people can switch their cams on that's great. You don't need it on throughout the course, but it is a bit nicer for us to see people, and you can always switch it off if you need to answer the door, etc. I'll run through the list rather than ask people to nominate each other, so if people can introduce themselves, say a little bit about their role, and what previous experience they've had with R, if any.

Thank everyone.

You will have got some pre-course instructions about accessing RStudio or RStudio Cloud. We'll come onto this in a bit, but you've got a bit of grace before actual code writing, so don't panic if you've not got everything set up already.

What we're going to start with is a bit of information about the course and what's possible with R, before we get into using R and RStudio.

Housekeeping: keep cams on if you're comfortable with that, mute your mic if there's background noise, and if you need to pop out at any point that's absolutely fine. I have a lot of windows open, so if you have a question you can ask in the chat and I'll try to pick it up; otherwise you're welcome to shout it out as we go through, wait until a break, whatever works for you.

I'll share my screen now to get going properly. Is this zoomed in enough for people to see?

---

class: center, middle

# Agenda

Using RStudio

Importing data

--------- Break ---------

Introduction to ggplot2

What does this function do?

--------- Lunch -----------

Data wrangling with dplyr

Naming objects | Relational data

--------- Break ---------

R Markdown

Ongoing learning

.green[Finish about 4 to 4:30pm]

???
The agenda for the day looks like this. It is a long day, but we will aim to finish certainly before half 4, and ideally a bit closer to 4. We'll have a couple of breaks either side of lunch as well to get a beverage of your choice in.

---

class: center, middle

# Course Aims

#### 1. To show you some of the possibilities

#### 2. To give you a feel for how R works

#### 3. To show you enough for you to begin teaching yourself

.blue[(Excellent free resources available)]

???

So the aim of this course is that it acts as a gentle introduction and essentially gets you to the point of being confident enough to start exploring R yourself. We're not going to cover everything you can do with R in a day, so we're just going to cover the basics and offer some pointers on where to go if you're interested in learning more.

---

class: inverse, middle, center

.left-col[.center[
### elegant
# Graphics
]]

???

To start with, you might be wondering what's possible with R. I'll introduce some examples here of the types of things that are possible and the topics we're going to cover.

One of the things R does really well is visualising data, in a much more flexible way than you'd get from, say, MS Excel.

---

[<img class="center" src="img/session01/heatmap_colin_angus.PNG"/>](https://github.com/VictimOfMaths/COVID-19)

???

Here's an example heatmap showing how COVID-19 cases progressed over time, split by local authority.

---

The original blog post that featured this map is no longer available.

<img class="center" src="img/session01/london_cycle_routes.PNG" width="90%"/>

???
It's also possible to do interesting geospatial visualisations, like this map of frequent London cycle hire journeys.

---

class: center, middle

# Collaboration

<svg viewBox="0 0 448 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M413.1 222.5l22.2 22.2c9.4 9.4 9.4 24.6 0 33.9L241 473c-9.4 9.4-24.6 9.4-33.9 0L12.7 278.6c-9.4-9.4-9.4-24.6 0-33.9l22.2-22.2c9.5-9.5 25-9.3 34.3.4L184 343.4V56c0-13.3 10.7-24 24-24h32c13.3 0 24 10.7 24 24v287.4l114.8-120.5c9.3-9.8 24.8-10 34.3-.4z"></path></svg>

## Reproducibility

<svg viewBox="0 0 448 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M413.1 222.5l22.2 22.2c9.4 9.4 9.4 24.6 0 33.9L241 473c-9.4 9.4-24.6 9.4-33.9 0L12.7 278.6c-9.4-9.4-9.4-24.6 0-33.9l22.2-22.2c9.5-9.5 25-9.3 34.3.4L184 343.4V56c0-13.3 10.7-24 24-24h32c13.3 0 24 10.7 24 24v287.4l114.8-120.5c9.3-9.8 24.8-10 34.3-.4z"></path></svg>

### R Markdown

???

Something else to mention: doing analysis and visualisation in R brings a number of benefits that can make your workflow better.

Firstly, you can share R code amongst your team so everyone can work on a project at once.

Using code also makes your analysis reproducible, so if you need to re-run an analysis it's really easy to do so, and if anyone wants to verify it, they can.

And R can also be used, with something called R Markdown, to generate reports, documents, etc. on demand. So if you have a set report that you create periodically, using R Markdown can save you time compared with updating Word files by hand every time.

---

<img class="center" src="img/session01/automated_reports.PNG"/>

???

You can also use R Markdown to generate PowerPoint-compatible slides, for example.

---

# (Interactive) Dashboards

[Mental Health Surge Modelling](https://strategyunit.shinyapps.io/MH_Surge_Modelling/)

<img src="img/session01/mental_health_modelling.PNG"/>

???
You can also use something called Shiny to create interactive dashboards that people can use. Here's one example from the Strategy Unit on mental health surge modelling.

---

[Trafford Data Lab](https://trafforddatalab.shinyapps.io/trafford-tweet-dash/) and the main [site](https://www.trafforddatalab.io/)

<img src="img/session01/twitter_dash.PNG"/>

???

And here's another from the Trafford Data Lab, summarising the subjects people have tweeted Trafford Council about.

---

# R to SQL connection

[NHS-R Community Webinar](https://nhsrcommunity.com/learn-r/workshops/database-connections-in-r-webinar)

<img src="img/session01/webinar_nhsr.PNG"/>

???

You can also use R to connect to databases. Things you currently use SQL for, or cases where you export SQL query results into a flat file for later analysis, could be made easier and more powerful by using R alongside your existing databases and any SQL you already have.

---

class: center, middle

# Inclusivity

<!-- -->

???

Another thing that's great about R is the community of R users that has grown up around it. As you'll know, there is the excellent NHS-R Community, but there are also groups like Minorities in R and R-Ladies, and lots of other special interest groups and communities with resources, tutorials and code libraries for whatever you're interested in. It's really one of the best things about R.

---

class: inverse, middle, center

.left-col[.center[
## Course Philosophy
]]

???

Next slide

---

[Minimum Viable Product](https://blog.crisp.se/2016/01/25/henrikkniberg/making-sense-of-mvp)

.left-col[.center[
<img src="img/session01/mvp.PNG" width="75%"/>
]]

???

OK, so the idea with this course is not that we go through one subject in exhaustive detail, because that sort of approach only really comes together at the end. So we're not doing this like the top line, building one part at a time, which only makes sense once everything is finished.
The aim is to give you small, self-contained chunks that are useful on their own, or at least give you a base understanding.

---

class: center, middle

# Course philosophy

Relaxed and informal

Slides and code are available on [<svg viewBox="0 0 496 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M165.9 397.4c0 2-2.3 3.6-5.2 3.6-3.3.3-5.6-1.3-5.6-3.6 0-2 2.3-3.6 5.2-3.6 3-.3 5.6 1.3 5.6 3.6zm-31.1-4.5c-.7 2 1.3 4.3 4.3 4.9 2.6 1 5.6 0 6.2-2s-1.3-4.3-4.3-5.2c-2.6-.7-5.5.3-6.2 2.3zm44.2-1.7c-2.9.7-4.9 2.6-4.6 4.9.3 2 2.9 3.3 5.9 2.6 2.9-.7 4.9-2.6 4.6-4.6-.3-1.9-3-3.2-5.9-2.9zM244.8 8C106.1 8 0 113.3 0 252c0 110.9 69.8 205.8 169.5 239.2 12.8 2.3 17.3-5.6 17.3-12.1 0-6.2-.3-40.4-.3-61.4 0 0-70 15-84.7-29.8 0 0-11.4-29.1-27.8-36.6 0 0-22.9-15.7 1.6-15.4 0 0 24.9 2 38.6 25.8 21.9 38.6 58.6 27.5 72.9 20.9 2.3-16 8.8-27.1 16-33.7-55.9-6.2-112.3-14.3-112.3-110.5 0-27.5 7.6-41.3 23.6-58.9-2.6-6.5-11.1-33.3 2.6-67.9 20.9-6.5 69 27 69 27 20-5.6 41.5-8.5 62.8-8.5s42.8 2.9 62.8 8.5c0 0 48.1-33.6 69-27 13.7 34.7 5.2 61.4 2.6 67.9 16 17.7 25.8 31.5 25.8 58.9 0 96.5-58.9 104.2-114.8 110.5 9.2 7.9 17 22.9 17 46.4 0 33.7-.3 75.4-.3 83.6 0 6.5 4.6 14.4 17.3 12.1C428.2 457.8 496 362.9 496 252 496 113.3 383.5 8 244.8 8zM97.2 352.9c-1.3 1-1 3.3.7 5.2 1.6 1.6 3.9 2.3 5.2 1 1.3-1 1-3.3-.7-5.2-1.6-1.6-3.9-2.3-5.2-1zm-10.8-8.1c-.7 1.3.3 2.9 2.3 3.9 1.6 1 3.6.7 4.3-.7.7-1.3-.3-2.9-2.3-3.9-2-.6-3.6-.3-4.3.7zm32.4 35.6c-1.6 1.3-1 4.3 1.3 6.2 2.3 2.3 5.2 2.6 6.5 1 1.3-1.3.7-4.3-1.3-6.2-2.2-2.3-5.2-2.6-6.5-1zm-11.4-14.7c-1.6 1-1.6 3.6 0 5.9 1.6 2.3 4.3 3.3 5.6 2.3 1.6-1.3 1.6-3.9 0-6.2-1.4-2.3-4-3.3-5.6-2z"></path></svg> GitHub](https://github.com/nhs-r-community/intro_r)

The truth; but it can’t be the whole truth… too much to cover in a day

???

Today's going to be pretty informal and relaxed, and you'll get the slides after the session as well.
We're going to cover what we can in a long day, but we could spend a lifetime on this stuff, so today's just the first step on your journey.

---

class: inverse, middle, center

# Let's begin

???

Next slide

---

class: center, middle

## R vs. RStudio

R is a programming language

RStudio is a software application with tools to improve your programming experience

???

So the first topic to cover is the difference between R and RStudio. You'll hear people using both terms, so we'll just spend a minute explaining what each is.

---

<img class="center" src="img/session01/race_car.PNG"/>

???

R is the underlying programming language, so you can think of it like the 'engine', or the fundamental structure, of a car. It's what is required to get things going, but if that's all you had, you might find it a bit unfriendly or uncomfortable to work with, like the stripped-down car in this picture.

---

<img class="center" src="img/session01/roberto-nickson-unsplash.PNG"/>

???

RStudio is a piece of software that makes working with R much easier. To continue the analogy, you can think of RStudio as the dashboard, driver's seat and plush interior of the car. It won't work without R doing all the work in the background, but it makes everything more comfortable and easier. The dashboard gives you additional information about what's going on with the car, and the controls let you change how the car works in a nice, easy way.

---

# RStudio

Many excellent features to help you with your analyses.

You never again have to think about R and RStudio as separate: opening RStudio opens an R session.

Analogy from the book ModernDive: [www.moderndive.com](https://www.moderndive.com)

???

So, when you load up RStudio, it will load an R session in the background, and you don't have to worry about the user-unfriendliness of interacting directly with R.

---

class: center, middle

# Open RStudio

RStudio opens an R session

???
OK, we're nearly at the point of letting everybody loose in RStudio. Can everyone try to open RStudio and see if you can get to something that looks like this... (next slide). If not, shout up.

---

.left-column[
The Console is your window to R.

You can code directly in the console…

`pi*2` <kbd> Enter </kbd>

`37/12` <kbd> Enter </kbd>

… but there is a better way…
]

.right-column[
<img src="img/session01/rstudio_console.PNG"/>
]

???

When you first start RStudio, you'll see the console window on the left. You can type R code directly in here and it will be evaluated instantly. You can try this in a second with some basic arithmetic, and see how it calculates.

However, the way you really want to be working is to use R script files....

---

.left-column[
#### The Editor

If you don’t see the Editor pane, click the top-left button and choose "R Script" from the drop-down.

Or, shortcut: <kbd> Ctrl + Shift + N </kbd>

.blue[The cloud shortcut is <kbd> Ctrl + Shift + Alt + N </kbd>]
]

.pull-right[
<img src="img/session01/new_file.PNG"/>
]

???

So to open an R script file, you can click the little plus document button in the top left of RStudio and then "R Script", or alternatively press Ctrl + Shift + N if you're using RStudio Desktop, or Ctrl + Shift + Alt + N on RStudio Cloud.

---

.left-column[
The Editor is just like any other text editor: you can copy, paste, and save text.

<kbd> Ctrl + Z </kbd> undoes *but* <kbd> Ctrl + Shift + Z </kbd> redoes (.blue[not <kbd> Ctrl + Y </kbd>])

Different text is coloured (the console is uniform)

Autocomplete

<kbd> Ctrl + Enter </kbd> (sends line of code to Console)
]

.right-column[
<img src="img/session01/editor.PNG"/>
]

???

You can copy, paste, and save the file in the editor window. It also has the advantage of things like colour-coding the text, giving you autocomplete suggestions and allowing you to run code in chunks, rather than typing it line by line at the console.
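---

# A first R script

A minimal sketch of what a first script might look like, reusing the Console arithmetic. The comments are only illustrative: put the cursor on a line and press <kbd> Ctrl + Enter </kbd> to send it to the Console, where the result is printed.

```r
# My first R script
# Lines starting with a hash are comments: R ignores them when running the code

# Arithmetic, just as you would type it at the Console
pi * 2   # prints [1] 6.283185
37 / 12  # prints [1] 3.083333
```

Saving this as a file (for example `first_script.R`, a hypothetical name) means you can re-run every step later rather than retyping it at the Console.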
---

.left-column[
Comment code with a `#`

E.g. `# this was a bad idea`

Comment frequently, at least in the beginning
]

.right-column[
<img src="img/session01/editor.PNG"/>
]

???

You can also write notes in the code by prefixing a line with the hash symbol. It's good practice to do this, so you can remind yourself and others why you are doing each step. There's nothing more frustrating than wondering why you did something 2 months ago and having no notes to tell you. So I would encourage you to do this.

OK, so we'll pause here as we've got to the exciting bit of having a go yourselves. I'll open RStudio myself, and we'll all have a little play around and familiarise ourselves with the layout.

Try to load RStudio (is everyone with me?)

Can everyone see something like this? If so, try some arithmetic at the console like this.

Then create an R script file and add some comments.

Run the code by selecting it and clicking 'Run', or by clicking 'Source' to run the whole file.

Run through any problems....

OK, hopefully that's all working. I'm going to move on now, so I'll go back to the slides.

---

# Tools -> Options

[Reasons why this is the default](https://community.rstudio.com/t/defaults-of-saving-and-restoring-workspace/939)

<img class="center" src="img/session01/global_options.PNG"/>

???

Under Tools > Global Options in the menu there are lots of options for working in RStudio. I'm going to suggest unticking the box marked 'Restore .RData into workspace at startup' and then clicking OK. This means you'll start R 'fresh' each time, without things from your old analysis saved in the workspace. This is good practice because your script should be self-contained, and it reduces the risk of errors being missed because you've got old data or objects left over from last time.

---

# Tools -> Options

Accessibility and comfort for all

<img class="center" src="img/session01/appearance.PNG"/>

???
Also under Global Options is an Appearance section, which lets you change the theme. So if you like a dark mode, or you want to use different colours from the default ones, have a play around here and find something you like.

---

class: center, middle

# Packages

### R packages are like apps for your phone:

--

Extend the capabilities of basic R (or "base R") with extra functions, datasets and documentation.

???

The next thing to cover is the idea of R packages. You can think of these as like the little 'apps' you have on your phone. They're self-contained things that help you to do what you need to do.

They augment what you're doing with extra functions, extra datasets, or extra documentation. They will save you time and allow you to do things that aren't easy to do yourself. So rather than write code to do something yourself, you might find that someone has already written something that you can use or build on, saving you time and allowing you to do things that you otherwise wouldn't have the time or the ability to do.

I typically use a number of packages for each R project, and you will probably find you do the same. There's no point reinventing the wheel.

---

<img class="center" src="img/session01/app_analogy.PNG"/>

???

So if you were going to use an app on your phone, you'd follow a two-step process. First you install the app, to get it onto your phone. You only have to do that once. When you want to use the app, you open it.

It's the same for R packages. You need to install a package the first time you want to use it, using the `install.packages()` function, then each time you want to use it you load it by using the `library()` function.

---

# Packages

Quotation marks in R can be either double `""` or single `''`, but they cannot be mixed within one string:

```r
# either
install.packages("tidyverse")

# or
install.packages('tidyverse')
```

will download a package to your personal library.
Then:

```r
library(tidyverse)
```

This tells R to load the package from your personal library and is .blue[needed for every new session/script]

???

This is the code you'll use. Can everyone try both these commands in the console, or write them in an R script and run them? (pause) Everyone OK?

---

class: center, middle

## CRAN repository

[Comprehensive R Archive Network](https://cran.r-project.org/web/packages/)

18,903 packages (April 2022). Free. Peer reviewed.

(Manifold possibilities) e.g. interactive graphics and dashboards, machine learning, mining Twitter data, creating PowerPoint docs, maps…

## GitHub

Many useful packages that are in development or subject to a lot of change are not on CRAN but are available through GitHub, including [{NHSRtheme}](https://github.com/nhs-r-community/NHSRtheme). These won't be peer reviewed.

## rOpenSci

rOpenSci offers a peer-reviewed ecosystem of R packages through GitHub, including UKHSA's [{fingertipsR}](https://github.com/ropensci/fingertipsR).

???

Packages are stored in repositories, which are big archives of packages. Mostly you'll deal with the Comprehensive R Archive Network, or CRAN, which has the most packages. Packages submitted there go through an element of peer review and checks. You'll find all sorts of packages, from visualisation ones through to machine learning or modelling, text mining or NLP, ones that create Office documents, geospatial ones, all sorts.

---

<img class="center" src="img/session01/tidyverse.PNG"/>

???

I'm going to spend a little time discussing one suite of packages called the tidyverse.

---

class: center, middle

## What is the tidyverse?

The [tidyverse package](https://www.tidyverse.org/) collects (some of) the most popular R packages into one.

All have the same underlying principles: provide simple tools (with consistent structure) which may be used together to help solve complex problems.

???
The tidyverse is a group of packages which work really well together and make common tasks much easier, so you'll find you use it almost universally when working in R.

---

class: center, middle

## What is the tidyverse?

During the workshop we will use the [ggplot2](https://ggplot2.tidyverse.org/), [dplyr](https://dplyr.tidyverse.org/), and [readr](https://readr.tidyverse.org/) packages.

These are bundled up in the tidyverse package. Load it by running:

```r
library(tidyverse)
```

???

In the workshop we're going to have a look at ggplot2 for drawing graphs, readr for importing data, and dplyr for manipulating data. If you install and load the tidyverse, you should see the following (next slide).

---

# Output

Information: what was loaded and potential conflicts

```
-- Attaching packages ------------------------------------------ tidyverse 1.3.0 --
v ggplot2 3.3.3     v purrr   0.3.4
v tibble  3.1.0     v dplyr   1.0.4
v tidyr   1.1.2     v stringr 1.4.0
v readr   1.4.0     v forcats 0.5.1
-- Conflicts --------------------------------------------- tidyverse_conflicts() --
x dplyr::filter() masks stats::filter()
x dplyr::lag()    masks stats::lag()
```

Warnings: not errors, just information on versions

```
Warning messages:
1: package ‘tidyverse’ was built under R version 4.0.4
2: package ‘tidyr’ was built under R version 4.0.3
3: package ‘readr’ was built under R version 4.0.3
4: package ‘purrr’ was built under R version 4.0.3
5: package ‘dplyr’ was built under R version 4.0.3
6: package ‘stringr’ was built under R version 4.0.3
```

???

Once you've installed it via the `install.packages()` function and then used the `library()` function to load it, you'll get some information about which versions are loaded, and which function names are masked (overwritten). You don't need to worry too much about this at this stage.

Depending on the version of R you have, you might also get some warning messages like these. Again, these are advisory warnings, so they don't stop anything running.
Don't worry about them if you see them.

Has everyone got something similar to the above? Does anyone have any questions? If not, the second session before the break is about RStudio projects.

---

#### This work is licensed under Creative Commons Attribution ShareAlike 4.0 International

To view a copy of this license, visit https://creativecommons.org/licenses/by/4.0/

---

name: goodbye
class: middle, inverse

# **Thank you!**

Acknowledgements: for creating the original training slides and delivering training:

Andrew Jones | Ozayr Mohammed

Healthcare Analysts | The Strategy Unit

[<svg viewBox="0 0 512 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M440 6.5L24 246.4c-34.4 19.9-31.1 70.8 5.7 85.9L144 379.6V464c0 46.4 59.2 65.5 86.6 28.6l43.8-59.1 111.9 46.2c5.9 2.4 12.1 3.6 18.3 3.6 8.2 0 16.3-2.1 23.6-6.2 12.8-7.2 21.6-20 23.9-34.5l59.4-387.2c6.1-40.1-36.9-68.8-71.5-48.9zM192 464v-64.6l36.6 15.1L192 464zm212.6-28.7l-153.8-63.5L391 169.5c10.7-15.5-9.5-33.5-23.7-21.2L155.8 332.6 48 288 464 48l-59.4 387.3z"></path></svg> andrew.jones40@nhs.net](mailto:andrew.jones40@nhs.net)

[<svg viewBox="0 0 512 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M440 6.5L24 246.4c-34.4 19.9-31.1 70.8 5.7 85.9L144 379.6V464c0 46.4 59.2 65.5 86.6 28.6l43.8-59.1 111.9 46.2c5.9 2.4 12.1 3.6 18.3 3.6 8.2 0 16.3-2.1 23.6-6.2 12.8-7.2 21.6-20 23.9-34.5l59.4-387.2c6.1-40.1-36.9-68.8-71.5-48.9zM192 464v-64.6l36.6 15.1L192 464zm212.6-28.7l-153.8-63.5L391 169.5c10.7-15.5-9.5-33.5-23.7-21.2L155.8 332.6 48 288 464 48l-59.4 387.3z"></path></svg> ozayr.mohammed@nhs.net](mailto:ozayr.mohammed@nhs.net)

And to Silvia Canelón, who created the Xaringan presentation using NHS and NHS-R colour branding and shared this at the 2020 [NHS-R Community conference](https://spcanelon.github.io/xaringan-basics-and-beyond/index.html).
Details of the workshops she ran at the [NHS-R Community conference](https://spcanelon.github.io/xaringan-basics-and-beyond/index.html) are available online.

And to Zoë Turner, who converted the presentation to Xaringan.

[<svg viewBox="0 0 512 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M440 6.5L24 246.4c-34.4 19.9-31.1 70.8 5.7 85.9L144 379.6V464c0 46.4 59.2 65.5 86.6 28.6l43.8-59.1 111.9 46.2c5.9 2.4 12.1 3.6 18.3 3.6 8.2 0 16.3-2.1 23.6-6.2 12.8-7.2 21.6-20 23.9-34.5l59.4-387.2c6.1-40.1-36.9-68.8-71.5-48.9zM192 464v-64.6l36.6 15.1L192 464zm212.6-28.7l-153.8-63.5L391 169.5c10.7-15.5-9.5-33.5-23.7-21.2L155.8 332.6 48 288 464 48l-59.4 387.3z"></path></svg> zoe.turner2@nottshc.nhs.uk](mailto:zoe.turner2@nottshc.nhs.uk)